
789 research outputs found

Efficient Large-scale Trace Checking Using MapReduce

Author: Barre B.
Bartocci E.
Basin D.
Bianculli D.
Coen-Porisini A.
Ho H.-M.
Mrad A.
Zaharia M.
Publication date: 26/08/2015

The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval for which logged events must be buffered to check satisfaction of the specification. The former issue can be addressed by distributed and parallel trace checking algorithms that can take advantage of modern cloud computing and programming frameworks like MapReduce. Still, the latter issue remains open with current state-of-the-art approaches. In this paper we address this memory scalability issue by proposing a new semantics for MTL, called lazy semantics. This semantics can evaluate temporal formulae and boolean combinations of temporal-only formulae at any arbitrary time instant. We prove that lazy semantics is more expressive than standard point-based semantics and that it can be used as a basis for a correct parametric decomposition of any MTL formula into an equivalent one with smaller, bounded time intervals. We use lazy semantics to extend our previous distributed trace checking algorithm for MTL. We evaluate the proposed algorithm in terms of memory scalability and time/memory tradeoffs. Comment: 13 pages, 8 figures
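
As a rough illustration of why the interval size drives buffering, the following minimal Python sketch checks a bounded "eventually" operator under standard point-based semantics. It is not the lazy semantics or the MapReduce algorithm from the paper; the trace encoding and the name `eventually_within` are assumptions made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical trace encoding: (timestamp, atomic propositions), sorted by time.
Trace = List[Tuple[int, Set[str]]]


def eventually_within(trace: Trace, i: int, prop: str, lo: int, hi: int) -> bool:
    """Point-based check of F_[lo,hi] prop at trace position i."""
    t_i = trace[i][0]
    for t_j, props in trace[i:]:
        delta = t_j - t_i
        if delta > hi:
            break  # timestamps are sorted, so later events are out of range too
        if delta >= lo and prop in props:
            return True
    return False


trace = [(0, {"req"}), (3, set()), (7, {"ack"}), (20, {"ack"})]
print(eventually_within(trace, 0, "ack", 0, 10))
print(eventually_within(trace, 0, "ack", 8, 15))
```

The loop has to look at every event up to `hi` time units ahead of position `i`, which is why a large interval forces a checker to keep many events available at once.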

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Open Repository and Bibliography - Luxembourg

Computing Web-scale Topic Models using an Asynchronous Parameter Server

Author: Asuncion A.
Hofmann T.
Yu H.-F.
Zaharia M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale up to the massive size of today's publicly available Web-scale data sets. The state-of-the-art approaches rely on custom strategies, implementations and hardware to facilitate their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include convenient usage of existing data processing pipelines and eliminating the need for disk writes as data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135 times more data and 10 times more topics without sacrificing model quality. Comment: To appear in SIGIR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

Adaptive multiagent system for seismic emergency management

Author: Florin LEON
Gabriela M. ATANASIU
Mihai Horia ZAHARIA

Presently, most multiagent frameworks are programmed in Java. Since the JADE platform has recently been ported to .NET, we used it to create an adaptive multiagent system where the knowledge base of the agents is managed using the CLIPS language, also called from .NET. The multiagent system is applied to create seismic risk scenarios, simulations of emergency situations, in which different parties, modeled as adaptive agents, interact and cooperate. Keywords: adaptive systems, risk management, seisms.

Research Papers in Economics

Design patterns for multi-agent simulations

Author: Florin Leon
Gabriela M. Atanasiu
Mihai Horia Zaharia
Stefan Boronea

The advent of mobile agent technology has brought along a few difficulties in designing a stable, efficient and scalable system for a certain problem. Agent-based simulations prove to be powerful tools for economic analyses. In this paper we aim at describing a set of design patterns which were specifically built for agents and multi-agent systems. The details of each design pattern discussed are presented and the possible applications and known issues are noted. In order to aid the software designers, we provide some examples of the basic implementation of these patterns using the JADE multi-agent framework. Keywords: intelligent agent, multi-agent design, multi-agent simulation.

Research Papers in Economics

Asymptotically Optimal Approximation Algorithms for Coflow Scheduling

Author: Ahuja R. K.
Al-Fares M.
Al-Fares Mohammad
Peis B.
Zaharia M.
Zhao Y.
Publication date: 08/03/2018

Many modern datacenter applications involve large-scale computations composed of multiple data flows that need to be completed over a shared set of distributed resources. Such a computation completes when all of its flows complete. A useful abstraction for modeling such scenarios is a coflow, which is a collection of flows (e.g., tasks, packets, data transmissions) that all share the same performance goal. In this paper, we present the first approximation algorithms for scheduling coflows over general network topologies with the objective of minimizing total weighted completion time. We consider two different models for coflows based on the nature of individual flows: circuits, and packets. We design constant-factor polynomial-time approximation algorithms for scheduling packet-based coflows with or without given flow paths, and circuit-based coflows with given flow paths. Furthermore, we give an $O(\log n/\log \log n)$-approximation polynomial time algorithm for scheduling circuit-based coflows where flow paths are not given (here $n$ is the number of network edges). We obtain our results by developing a general framework for coflow schedules, based on interval-indexed linear programs, which may extend to other coflow models and objective functions and may also yield improved approximation bounds for specific network scenarios. We also present an experimental evaluation of our approach for circuit-based coflows that shows a performance improvement of at least 22% on average over competing heuristics. Comment: Fixed minor typo
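
The objective named in this abstract can be made concrete with a small Python sketch. It only evaluates the total weighted completion time of a given schedule, using the rule that a coflow completes when its last flow completes; it does not implement the paper's approximation algorithms. The coflow names, finish times, and weights are invented for the example.

```python
from typing import Dict, List


def coflow_completion_time(flow_finish_times: List[float]) -> float:
    """A coflow completes only when all of its flows have completed."""
    return max(flow_finish_times)


def total_weighted_completion_time(
    coflows: Dict[str, List[float]], weights: Dict[str, float]
) -> float:
    """Sum of weight * completion time over all coflows (the scheduling objective)."""
    return sum(
        weights[name] * coflow_completion_time(finish_times)
        for name, finish_times in coflows.items()
    )


# Hypothetical schedule: finish time of each flow, grouped by coflow.
coflows = {"shuffle": [2.0, 5.0, 3.0], "backup": [4.0, 1.0]}
weights = {"shuffle": 2.0, "backup": 1.0}
print(total_weighted_completion_time(coflows, weights))
```

Because each coflow contributes through its slowest flow, finishing some flows early does not help unless the last flow of that coflow also finishes earlier.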

arXiv.org e-Print Archive

Crossref

On data skewness, stragglers, and MapReduce progress indicators

Author: Chambers J. M.
Dai J.
Gufler B.
Herodotou H.
Li J.
Ousterhout K.
Zaharia M.
Publication date: 01/01/2015

We tackle the problem of predicting the performance of MapReduce applications, designing accurate progress indicators that keep programmers informed on the percentage of completed computation time during the execution of a job. Through extensive experiments, we show that state-of-the-art progress indicators (including the one provided by Hadoop) can be seriously harmed by data skewness, load unbalancing, and straggling tasks. This is mainly due to their implicit assumption that the running time depends linearly on the input size. We thus design a novel profile-guided progress indicator, called NearestFit, that operates without the linear hypothesis assumption and exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. Our theoretical progress model requires fine-grained profile data, that can be very difficult to manage in practice. To overcome this issue, we resort to computing accurate approximations for some of the quantities used in our model through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment over the Amazon EC2 platform on a variety of real-world benchmarks shows that NearestFit is practical w.r.t. space and time overheads and that its accuracy is generally very good, even in scenarios where competitors incur non-negligible errors and wide prediction fluctuations. Overall, NearestFit significantly improves the current state-of-art on progress analysis for MapReduce
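
The linear hypothesis mentioned in this abstract can be illustrated with a toy Python sketch. It is not NearestFit and does not use Hadoop; the quadratic per-group cost and the group sizes are assumptions chosen only to show how skew can mislead an input-proportional estimate.

```python
from typing import Callable, Sequence


def linear_progress(processed_records: int, total_records: int) -> float:
    """Progress estimate under the linear hypothesis: time is proportional to input size."""
    return processed_records / total_records


def actual_progress(
    done_groups: Sequence[int],
    all_groups: Sequence[int],
    cost: Callable[[int], int] = lambda n: n * n,  # assumed super-linear cost per key group
) -> float:
    """Fraction of the total running time already spent, for the assumed cost model."""
    spent = sum(cost(n) for n in done_groups)
    total = sum(cost(n) for n in all_groups)
    return spent / total


# Skewed input: one small key group and one large one (sizes are made up).
groups = [10, 90]
estimated = linear_progress(groups[0], sum(groups))
real = actual_progress(groups[:1], groups)
print(f"linear estimate: {estimated:.1%}, actual: {real:.1%}")
```

After the small group is processed, the input-proportional estimate is far ahead of the time actually spent under this cost model, which is the kind of gap the abstract attributes to data skewness.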

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Archivio della ricerca- Università di Roma La Sapienza

Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.

Author: Aghamolaei S.
Awasthi P.
Ceccarello M.
Charikar M.
Henzinger M.
Malkomes G.
Mikolov T.
Mitzemacher M.
Munro J.
Zaharia M.
Publication venue: 'VLDB Endowment'
Publication date: 01/01/2019

Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k0, the algorithms yield solutions whose approximation ratios are a mere additive term ϵ away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented with a set of experiments on real-world and synthetic datasets of up to over a billion points, which show that our algorithms yield better quality solutions over the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
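
For orientation, here is a minimal Python sketch of the classic sequential greedy heuristic for k-center (farthest-first traversal, a well-known 2-approximation). It is not the MapReduce or streaming algorithm from the paper and ignores outliers; the function names and the sample points are assumptions for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Pick k centers greedily: each new center is the point farthest from the current ones."""
    centers = [points[0]]  # arbitrary first center
    dist = [math.dist(p, centers[0]) for p in points]  # distance to nearest chosen center
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers


def clustering_radius(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-center objective: maximum distance from any point to its closest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = farthest_first_centers(points, 2)
print(centers, clustering_radius(points, centers))
```

The sketch keeps all points in memory and makes k passes over them, which is what makes a direct port to very large datasets impractical.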

Crossref

The IT University of Copenhagen's Repository

Archivio istituzionale della ricerca - Università di Padova

GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search

Author: Beaver D.
Chi Y.
Papadimitriou A.
Poddar R.
Slee M.
Xie D.
Yao A.C.
Zaharia M.
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/05/2019

In this paper, we propose GraphSE$^2$, an encrypted graph database for online social network services to address massive data breaches. GraphSE$^2$ preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE$^2$ provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE$^2$ with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE$^2$ is practical for querying a social graph with a million users. Comment: This is the full version of our AsiaCCS paper "GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search". It includes the security proof of the proposed scheme. If you want to cite our work, please cite the conference version of it.

arXiv.org e-Print Archive

Crossref